Prompt Security And Validation
This document focuses on prompt security and injection prevention mechanisms implemented in the codebase. It explains how the system detects and mitigates prompt injection risks, sanitizes inputs for agents and external data sources, and enforces guardrails to prevent prompt poisoning, jailbreaking attempts, and malicious input exploitation. It also documents validation rules, logging and error handling for suspicious inputs, and secure prompt construction guidelines, along with integration points across the backend and agent workflows.
Security-relevant components are organized across prompts, services, routers, agents, utilities, and core infrastructure:
Prompts define the instruction templates used to detect prompt injection.
Services orchestrate validation and agent generation, including logging and error handling.
Routers expose endpoints for validation and agent execution.
Agents encapsulate the reasoning graph and message normalization.
Utilities provide sanitization for JSON action plans and legacy JS validation.
Core modules manage LLM providers and logging configuration.
Diagram sources
Section sources
Prompt Injection Validator: A dedicated prompt template instructs the LLM to classify website content as safe or unsafe for prompt injection.
Website Validator Service: Converts HTML to Markdown, constructs a validation chain, and interprets LLM output to produce a safety decision.
Agent Sanitizer: Validates and sanitizes JSON action plans produced by the agent to prevent unsafe browser actions and script execution.
React Agent and Service: Orchestrate message handling, optional file processing, and context injection from client HTML, with logging and error handling.
LLM Provider Abstraction: Centralizes provider selection, model configuration, and error handling for LLM calls.
Logging and Configuration: Global logging level and logger factory enable consistent security event logging.
Section sources
The system integrates three major security flows:
Website Content Safety: HTML → Markdown → Prompt Injection Classification → Safety Decision
Agent Action Safety: Agent-generated JSON → Sanitization → Execution Guardrails
Agent Prompt Safety: System prompt and context injection guarded by normalized messages and provider configuration
Diagram sources
Prompt Injection Validator Implementation#
Purpose: Determine whether website content may contain prompt injection attempts that could manipulate LLM behavior.
Template Design: The prompt instructs the LLM to analyze content and respond with a simple boolean classification.
Validation Chain: The service composes a LangChain prompt with the LLM and extracts a single “true” or “false” response to decide safety.
Diagram sources
Section sources
Agent Prompt Safety and Message Normalization#
System Prompt: A controlled system message guides the agent’s behavior and credentials handling.
Message Normalization: Messages are normalized to ensure consistent content representation across roles.
Graph Execution: The compiled LangGraph workflow ensures deterministic agent behavior and consistent message handling.
Diagram sources
Section sources
Agent Action Sanitization and Mitigation Strategies#
JSON Action Plan Validation: Ensures presence of required fields, validates action types, and checks required parameters per action category.
Script Safety Checks: Detects potentially dangerous patterns in custom scripts to prevent unsafe execution.
Legacy JS Validation: Provides a secondary filter for legacy JS patterns.
Diagram sources
Section sources
Endpoint Security and Error Handling#
Website Validation Endpoint: Exposes a POST endpoint that delegates to the validator service and returns a safety decision.
Agent Endpoint: Validates request fields, invokes the agent service, and handles exceptions with logging and HTTP error responses.
Diagram sources
Section sources
Website Validator depends on:
HTML-to-Markdown conversion
Prompt template for injection detection
LLM client for classification
Agent Service depends on:
GraphBuilder for workflow compilation
LLM client for generation
Logging for security event capture
Agent Sanitizer is independent but integrates with agent action outputs.
LLM Provider abstraction centralizes provider selection and error handling.
Diagram sources
Section sources
Prompt Injection Detection: The validation chain performs a single LLM invocation per request; keep prompt concise and avoid excessive context to minimize latency.
Agent Workflows: Graph caching via LRU reduces repeated compilation overhead; ensure message normalization avoids unnecessary conversions.
Sanitization: Regex-based checks are linear in input size; keep action plans minimal and avoid overly complex scripts.
Logging: Configure appropriate log levels to balance observability and performance.
[No sources needed since this section provides general guidance]
Common issues and remediation steps:
Validation returns unexpected results:
Verify HTML input is well-formed and not empty.
Confirm the LLM provider is configured and reachable.
Check that the prompt template remains unchanged and the response format matches expectations.
Agent action plan errors:
Ensure required fields are present for each action type.
Review dangerous script patterns flagged by sanitization.
Validate that action types are part of the allowed set.
Endpoint failures:
Inspect router-level HTTP exceptions and service logs.
Confirm environment variables for API keys and base URLs are set.
Logging:
Adjust logging level via configuration and review logs for security events.
Section sources
The system employs a layered security approach: HTML-to-Markdown conversion and LLM-based classification for prompt injection detection, strict JSON action plan validation with script safety checks, and robust logging and error handling across endpoints and agent workflows. These measures collectively reduce the risk of prompt poisoning, jailbreaking, and malicious input exploitation while maintaining flexibility and performance.
[No sources needed since this section summarizes without analyzing specific files]
Validation Rules and Threat Modeling#
Prompt Injection Detection:
Input: Website HTML
Transformation: HTML → Markdown
Output: Boolean classification indicating safety
Agent Action Validation:
Required fields per action type
Allowed action categories
Script pattern scanning for dangerous constructs
Threat Modeling Approaches:
Principle of least privilege for actions
Separation of concerns between content parsing and safety decisions
Defensive logging and HTTP error surfacing
Section sources
Secure Prompt Construction Guidelines#
Keep system prompts concise and explicit about prohibited behaviors.
Avoid exposing internal instructions or model internals in user-facing prompts.
Use structured outputs (e.g., single-word classifications) to reduce ambiguity.
Inject only sanitized context and avoid raw user-provided HTML or scripts.
Section sources
Integration with the Broader Security Framework#
Logging: Centralized logger factory and environment-based log levels.
Providers: Unified LLM provider configuration with explicit error handling.
Endpoints: Clear separation of validation and agent execution routes.
Section sources